Overview

Our LUR sampling scheme involved collocated sampling at three EPA monitoring sites in the Denver metropolitan area. Two of these sites (Globeville and National Jewish) only have PM2.5 monitoring data. However, the near-road monitoring site at I-25 has both PM2.5 and black carbon.

I’ve been using the monitoring data from the I-25 site to calibrate the UPAS PM2.5 and BC data. The PM2.5 calibration model was decent (adjusted R2 ~ 60%), whereas the calibration curve for BC (using only the UPAS measurement to predict the monitor measurement) was terrible (a negative adjusted R2). I also fit calibration models for PM2.5 using the Globeville and National Jewish monitors; the Globeville model performed similarly to the I-25 model.

The fit for the BC data across all campaigns was quite poor; there didn’t appear to be any correlation between the monitor data and the UPAS data. However, once the data were stratified by campaign, some obvious relationships emerged. For some reason, the UPAS monitors performed differently depending on the campaign. Christian thinks ambient temperature might be an issue: there may be discrepancies between the actual sampled volume and the recorded sampling volume due to air density differences at cold temperatures, or the UPAS might have been prone to leaks.

One option to address this fit issue was to fit separate calibration curves for each campaign. However, we did not have collocated data for Campaign 1 (spring) because we weren’t able to arrange access to the I-25 site before the summer. Therefore, my strategy for fitting a calibration model was to use indicator variables for campaign, with Campaign 2 serving as the reference group (Campaign 1, which lacked collocated data, falls into the reference group by default).
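A minimal sketch of that indicator coding (the data frame here is illustrative; `camp3` and `camp4` match the indicator names used in the fitted models):

```r
# Illustrative indicator-variable coding for campaign (simulated rows).
# Campaigns 1 and 2 both map to the reference level (camp3 = camp4 = 0).
cal <- data.frame(campaign = c(1, 2, 2, 3, 3, 4))
cal$camp3 <- as.numeric(cal$campaign == 3)
cal$camp4 <- as.numeric(cal$campaign == 4)
cal
```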

Interestingly, the model with the UPAS measurement, the indicator variables for campaign, and monitoring site temperature was not that successful at predicting monitoring site BC; it performed similarly to the model with just the UPAS measurement. What dramatically improved model fit was using the monitoring site PM2.5 measurements in addition to temperature and campaign.

The selected BC model used UPAS BC, temperature, temperature squared, monitoring site PM2.5, and indicator variables for campaign. The adjusted R2 for this model was 0.65.

After fitting the BC model, I went back and tested the PM2.5 model to see if adding indicator variables for campaign and temperature could improve model fit. I did see a slight improvement, so I updated the calibration model.

The final PM2.5 model uses UPAS PM2.5 and indicator variables for campaigns 3 and 4 (campaign 2 is the reference group). The adjusted R2 for this model was 0.66.

Below I have outlined the calibration processes for both PM2.5 and BC.

One outstanding issue will be to identify why there were differences by campaign. This might require some further investigation; we might want to have the UPAS units checked out (e.g., check for leaks, verify that the flow is still calibrated correctly, etc.).

PM2.5 Calibration

First, we identified all of the UPAS filters co-located with the I-25 monitor. Over the course of the four sampling campaigns, we had 56 filters co-located with the three PM2.5 monitors (17 at the I-25 site). The raw PM2.5 time-weighted average concentrations from the UPAS filters at the I-25 site averaged 9.7 \(\mu\)g/m\(^3\) (median 10.1 \(\mu\)g/m\(^3\)) and ranged from 2.3 to 19.6 \(\mu\)g/m\(^3\).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.290   5.663  10.138   9.678  12.193  19.619

Temporal trends for the UPAS filter results were as expected. Concentrations were highest in the winter (Campaign 4) and summer (Campaign 2) due to increases in biomass burning and lowest in the spring and fall (Campaigns 1 and 3, respectively). On average, concentrations were higher at the I-25 and Globeville sites than at the National Jewish site. This result was expected, as the I-25 and Globeville sites are near major highways.

Figure 1. UPAS PM2.5 Time series

The EPA monitoring data showed similar patterns to the UPAS filter data time series. There was a spike in PM2.5 in September of 2018, likely due to nearby wildfires.

Figure 2. UPAS PM2.5 Time series

We fit a separate linear regression model for each monitor to identify the best fit for calibrating the full UPAS filter data set. Overall, there was modest agreement between the UPAS and EPA monitor results. The best overall fit was for the I-25 monitor (adjusted R2 = 0.61).
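The per-monitor comparison can be sketched roughly as follows (data are simulated; "080310027" is the I-25 monitor ID, while the other two site labels are placeholders):

```r
# Sketch: fit one calibration model per monitor and compare adjusted R2.
# Data simulated; "080310027" is the I-25 monitor ID, the other two labels
# stand in for the Globeville and National Jewish sites.
set.seed(42)
cal <- data.frame(
  monitor_id = rep(c("080310027", "site_B", "site_C"), each = 15),
  pm_ug_m3   = runif(45, 2, 20)
)
cal$monitor_mean <- 4.5 + 0.7 * cal$pm_ug_m3 + rnorm(45, sd = 3)

fits <- lapply(split(cal, cal$monitor_id), function(d) {
  lm(monitor_mean ~ pm_ug_m3, data = d)
})
sapply(fits, function(f) summary(f)$adj.r.squared)
```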

Figure 3. Scatter plots for PM2.5 (UPAS vs EPA)

The diagnostic plots for the linear regression model using only the UPAS PM2.5 as a predictor showed that the model fit reasonably well.

## 
## Call:
## lm(formula = monitor_mean ~ pm_ug_m3, data = filter(cal_data, 
##     monitor_id == "080310027"))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9362 -2.0823  0.3269  1.6789  4.9291 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.4876     1.4950   3.002 0.008941 ** 
## pm_ug_m3      0.6982     0.1371   5.091 0.000133 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.836 on 15 degrees of freedom
## Multiple R-squared:  0.6334, Adjusted R-squared:  0.609 
## F-statistic: 25.92 on 1 and 15 DF,  p-value: 0.0001328

After fitting the BC model, however, I wanted to examine potential effects of temperature and campaign on the PM2.5 model as well. I included indicator variables for campaigns 3 and 4 (with campaign 2 serving as the reference group) and explored different forms of the temperature variable (e.g., temperature and temperature squared). The final model was selected based on AIC, adjusted R2, and the distribution of model residuals. The final model uses UPAS PM2.5 and campaign as predictors and has an adjusted R2 of 0.66. The mean bias and RMSE for this model were 1.6 \(\mu\)g/m\(^3\) and 0.6, respectively.

## 
## Call:
## lm(formula = monitor_mean ~ pm_ug_m3 + camp3 + camp4, data = cal_data3)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.471 -1.612  0.093  1.424  4.045 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.03176    2.84201   0.011 0.991253    
## pm_ug_m3     0.89505    0.18719   4.781 0.000359 ***
## camp3        3.87557    2.30212   1.683 0.116127    
## camp4        3.35206    1.68321   1.991 0.067869 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.636 on 13 degrees of freedom
## Multiple R-squared:  0.7256, Adjusted R-squared:  0.6623 
## F-statistic: 11.46 on 3 and 13 DF,  p-value: 0.0005928
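The model-selection step described above can be sketched as below (data simulated for illustration; variable names mirror the fitted models, and the coefficients are made up):

```r
# Sketch of the AIC / adjusted-R2 comparison among candidate PM2.5
# calibration models (data simulated for illustration only).
set.seed(1)
n <- 17
d <- data.frame(
  pm_ug_m3     = runif(n, 2, 20),
  monitor_temp = runif(n, -5, 35),
  camp3        = rep(c(0, 1, 0), length.out = n),
  camp4        = rep(c(0, 0, 1), length.out = n)
)
d$monitor_mean <- 0.9 * d$pm_ug_m3 + 3.9 * d$camp3 + 3.4 * d$camp4 +
  rnorm(n, sd = 2.5)

m_base <- lm(monitor_mean ~ pm_ug_m3, data = d)
m_camp <- lm(monitor_mean ~ pm_ug_m3 + camp3 + camp4, data = d)
m_temp <- lm(monitor_mean ~ pm_ug_m3 + camp3 + camp4 + monitor_temp +
               I(monitor_temp^2), data = d)
AIC(m_base, m_camp, m_temp)  # lower AIC is better
```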

We used the linear regression model based on the I-25 monitoring data to calibrate the remaining filters. We then used box plots to compare raw and calibrated UPAS PM2.5 results to all EPA monitors in Adams, Arapahoe, and Denver counties. These plots are shown below. Overall, there appears to be reasonable agreement between what we measured using our UPAS monitors and general trends across the region.
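Applying the calibration to the non-collocated filters amounts to a `predict()` call; here is a minimal sketch (simulated data; `pm_model` and `field_filters` are illustrative names, and only the UPAS term is used for brevity):

```r
# Sketch: calibrate field filters by predicting the monitor-equivalent
# concentration from the fitted model. Data simulated; pm_model and
# field_filters are illustrative names.
set.seed(7)
cal <- data.frame(pm_ug_m3 = runif(17, 2, 20))
cal$monitor_mean <- 4.5 + 0.7 * cal$pm_ug_m3 + rnorm(17, sd = 2.8)
pm_model <- lm(monitor_mean ~ pm_ug_m3, data = cal)

field_filters <- data.frame(pm_ug_m3 = c(3.1, 9.7, 15.2))  # raw UPAS TWAs
field_filters$pm_calibrated <- predict(pm_model, newdata = field_filters)
field_filters
```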

Figure 4. Box plots of PM2.5 by month

Figure 5. Box plots of PM2.5 by season

BC Calibration

The BC calibration process was originally less successful than the PM2.5 calibration. Again, we are using aethalometer data from the I-25 site because it is the only location in the Denver area with BC monitoring. UPAS BC results were similar whether blank-corrected or not, so we elected to use the blank-corrected BC concentrations. UPAS BC concentrations averaged 3.2 \(\mu\)g/m\(^3\) and ranged from 0.3 to 5.8 \(\mu\)g/m\(^3\).

Unlike the PM2.5 data at this site, temporal trends for the UPAS filter results were unexpected. Our UPAS data showed declining BC concentrations at the I-25 site, with the lowest concentrations occurring in the winter. The time-series plot suggested that there may be differences in how BC was measured by campaign.

Figure 6. UPAS BC Time series

The EPA monitoring data were highly variable, but the I-25 site showed lower concentrations in the spring and level concentrations across the summer, fall, and winter seasons, likely attributable to consistent traffic and the influence of biomass burning in the summer and winter.

Figure 7. UPAS BC Time series

To be thorough, I wanted to compare the CDPHE BC data for the campaign periods to the EPA data for the same periods to see if there was a difference in how CDPHE processed their data. Comparisons between the raw CDPHE data received from Brad Rink and the BC data from the EPA website suggested near-perfect agreement, so there was no issue with using the EPA data.

Figure 8. Scatterplot of CDPHE vs. EPA data

Time series plots for CDPHE and EPA also suggest that the issue is with our data rather than the EPA or CDPHE data.
Figure 8. Time series of CDPHE and EPA BC data

Christian suggested that there might be differences in the metals sampled across seasons that could interfere with the transmissometry data. A quick summary of the S/K ratios measured for the filters suggests that the black carbon at this site is primarily from traffic (all ratios are above 1, though the ratio varies with time). S is associated with fossil fuel combustion, whereas K is more strongly linked to biomass burning. The S/K ratio averages 3.4 and ranges from 1.4 to 8.6 (IQR = 2.5). A time series plot suggests that S/K ratios for this near-road monitor are highest in the winter.

Figure 9. S/K ratios for I-25 site filters

The Yuma St. monitor doesn’t report metals data, but the Navajo St. site (3.3 miles away from the Yuma St. site) does. The S/K ratio at this site averages 2.6 and ranges from 0 to 26.8 (IQR = 2.3). These ratios tend to be much more consistent over time compared to the Yuma St. filter data, but this is not a near-road site.

Figure 10. S/K ratios for the nearby Navajo St. monitoring site

When we compared the UPAS TWA BC concentrations to the EPA monitoring data, we saw a pretty poor fit (Figure 11). Once the data were stratified by campaign, however, it became evident that the relationship differed by campaign (Figure 12).

Figure 11. Scatter plots for BC (UPAS vs EPA)

Figure 12. Scatter plots for BC (UPAS vs EPA) stratified by campaign

Our regression model fitting the EPA monitoring data with the UPAS data performed very poorly. The correlation between UPAS BC and monitor BC was 0.19. The linear regression resulted in a negative adjusted R2 value (-0.03), though diagnostic plots showed that the linear model form was appropriate.

## [1] 0.1884562

## 
## Call:
## lm(formula = monitor_mean ~ bc_ug_m3, data = cal_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46984 -0.23134 -0.05815  0.27305  0.68995 
## 
## Coefficients:
##             Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)  1.66759    0.21637   7.707 0.00000136 ***
## bc_ug_m3     0.04490    0.06042   0.743      0.469    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3749 on 15 degrees of freedom
## Multiple R-squared:  0.03552,    Adjusted R-squared:  -0.02878 
## F-statistic: 0.5524 on 1 and 15 DF,  p-value: 0.4688

To try to improve model fit, we first tried modeling each campaign separately. However, this approach would not work for the full data set because we only had co-located data for three of the four campaigns. Instead, we included indicator variables for campaign (with campaign 2 serving as the reference group) and the temperature measured at the I-25 site. These additions only marginally improved the model fit (the adjusted R2 remained near zero).

## 
## Call:
## lm(formula = monitor_mean ~ bc_ug_m3 + monitor_temp + monitor_temp_sq + 
##     camp3 + camp4, data = cal_data3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.48845 -0.20808 -0.09186  0.21122  0.59641 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      1.4268387  1.5455631   0.923   0.3757  
## bc_ug_m3         0.0846877  0.1179496   0.718   0.4877  
## monitor_temp    -0.0809352  0.0641507  -1.262   0.2332  
## monitor_temp_sq  0.0010958  0.0007369   1.487   0.1651  
## camp3            1.4215191  0.8081500   1.759   0.1063  
## camp4            1.6710969  0.9045827   1.847   0.0917 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.37 on 11 degrees of freedom
## Multiple R-squared:  0.3111, Adjusted R-squared:  -0.002005 
## F-statistic: 0.9936 on 5 and 11 DF,  p-value: 0.4646

What ended up having the largest effect on model fit was including the PM2.5 measurements at the I-25 site. Our final model included the UPAS BC measurement, monitor PM2.5, monitor temperature, monitor temperature squared, and the indicator variables for campaign (with campaign 2 as the reference group). This final model had an adjusted R2 of 0.65, a mean bias of -1.4 \(\mu\)g/m\(^3\), and an RMSE of 0.5.

## 
## Call:
## lm(formula = monitor_mean ~ bc_ug_m3 + monitor_temp + monitor_temp_sq + 
##     monitor_pm + camp3 + camp4, data = cal_data3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.36583 -0.14255  0.04323  0.10487  0.31271 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.4010193  0.9428983   0.425 0.679622    
## bc_ug_m3        -0.0846296  0.0789703  -1.072 0.309053    
## monitor_temp    -0.0640306  0.0382091  -1.676 0.124714    
## monitor_temp_sq  0.0010266  0.0004371   2.349 0.040743 *  
## monitor_pm       0.0793305  0.0171911   4.615 0.000958 ***
## camp3            1.7379496  0.4840106   3.591 0.004923 ** 
## camp4            1.4989189  0.5375970   2.788 0.019179 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2194 on 10 degrees of freedom
## Multiple R-squared:  0.7799, Adjusted R-squared:  0.6478 
## F-statistic: 5.905 on 6 and 10 DF,  p-value: 0.007247
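As a quick sketch, the mean bias and RMSE metrics reported for these models are computed from predicted vs. observed concentrations (the numbers here are illustrative, not our data):

```r
# Sketch: mean bias and RMSE for a set of predictions vs. observations.
# Values are illustrative only.
obs  <- c(1.2, 1.8, 2.1, 1.5, 2.4)
pred <- c(1.0, 2.0, 1.9, 1.7, 2.2)
err  <- pred - obs
c(mean_bias = mean(err), rmse = sqrt(mean(err^2)))
```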

After using this model to calibrate the full filter data set, we used box plots to compare raw and calibrated UPAS BC results to the EPA monitor at the I-25 site. These plots are shown below. The calibrated data performed reasonably well (as indicated by the median values); however, we lose most of the spatial variability we observed in the raw data. We’ll need to discuss next steps with our colleagues in engineering and chemistry.

Figure 13. Box plots of raw and calibrated BC by month

Figure 14. Box plots of raw and calibrated BC by season